Abstract

To  achieve  accurate  registration,  the  transformations 
which  locate  the  tracking  system  components  with  respect 
to  the  environment  must  be  known.  These  transformations 
relate  the  base  of  the  tracking  system  to  the  virtual  world 
and  the  tracking  system’s  sensor  to  the  graphics  display.  In 
this  paper  we  present  a  unified,  general  calibration  method 
for  calculating  these  transformations.  A  user  is  asked  to 
align  the  display  with  objects  in  the  real  world.  Using  this 
method,  the  sensor  to  display  and  tracker  base  to  world 
transformations  can  be  determined  with  as  few  as  three 
measurements. 


1.  Introduction 

Almost all Augmented Reality (AR) systems use a tracking
system to capture the motion of objects in the real world and
map it into the computer-generated environment. The
most important relationship is head tracking: whenever
the user moves their head in the “real world”, the viewpoint
in the graphics system should move accordingly. Similarly,
if tracked props or interaction devices are moved in the real
world, their virtual counterparts should move accordingly.

However,  registration  errors  are  the  result  of  three  error 
sources: 

1. Tracking errors. These occur when the measurement
returned by the tracker does not agree with the real
pose of the tracker.

2. Display calibration errors. These arise when the optical
characteristics of the display are unknown. They include
parameters such as field-of-view, distortion and
centre of projection. Although these parameters can
vary (for example, a camera with a zoom lens), in
many applications they are constant and
it is generally sufficient to calibrate the display once.
3. Tracker alignment errors. The sensor measurements
must be transformed so that the graphics are rendered
in the display at the correct viewpoint. The transformation
requires knowledge of the world-to-base transformation
(where is the origin of the tracker coordinate
system in the world?) and the sensor-to-manipulator
transformation (how is the sensor placed relative to
the display?). Although these parameters tend to stay
constant with time, they can vary when (1) the tracker
base is moved (e.g. the magnetic emitter of a magnetic
tracker is moved) or (2) the sensor is moved on the HMD
(e.g. relocated on the HMD, or the HMD’s headband is
adjusted).

The alignment problem we are concerned with is to determine
where the base is with respect to the origin of the
virtual world and where the manipulator is with respect
to the sensor to which it is attached. Many authors have
considered the problems of tracking errors and display errors
[2, 4, 6-8]. However, relatively few authors have considered
the problem of tackling the alignment errors. There
are many strategies that can be used to find the correct alignment
of the tracker, but there is no unified method that can
be used for any tracking system.

This is a surprisingly difficult problem for prototyping
and developing AR systems. In many systems, a flexible
display such as the Sony Glasstron allows the user to fit it
correctly to her head. However, every time an adjustment
is made, the transform between the display and the sensor
changes. In addition, it might be necessary to change the
location of the sensor on the headband, which each
time poses the problem of locating the new sensor with respect to
the display.

These problems are exacerbated if multiple sensing systems
are used to form hybrid trackers. One example is the
mobile AR system which is shown in Figure 1. This system
uses multiple tracking devices to track the position and orientation
of the user’s head. These devices are not referenced
in the same way, and their attitudes relative to each other
and to the display are not known. In addition, the
use of an inertial sensor stabilized by a compass and a GPS requires
knowledge of the frames of reference within which
these sensors give their measurements. Similar problems occur
in hybrid tracking systems where the observations from
multiple sensors are fused together in a central estimation
algorithm such as a Kalman filter [15].

Figure 1. The problems of tracker alignment
are exacerbated when multiple tracking systems
are used simultaneously. This mobile
system uses an inertial navigation system
and a GPS.

Early systems, such as those described in [1], often used
open-loop calibration. That is, they essentially had to trust
the measurement of the location of the tracking system in
the environment and on the HMD. Bajura, in contrast,
formed a closed-loop system with a video see-through system
by using the HMD camera to provide direct feedback
for the registration [4]. Although this approach solved the
sensor to HMD transform, it did not solve the tracker emitter
to world transform. Furthermore, the approach explicitly
assumed a video see-through display and cannot be used
for optical see-through display systems. To calibrate its
set of trackers (Flock of Birds, Faro mechanical arm, and
video cameras), the UNC ultrasound system exploited the
fact that the regions of each device overlapped [13]. However,
this configuration is highly specialized. Tuceryan proposed
a calibration method for obtaining the unknown rigid
transforms in the GRASP system [11]. While the method
was presented as being applicable to any AR system, it uses
a video see-through setup and a tracked pointer as part of
the calibration procedure, and this is not the configuration
of every AR system. One means of calibrating multiple
trackers was to let the operating regions of the trackers
“overlap”. Kutulakos and Vallino [10] used a
projective world and markers to align the virtual objects,
but this approach required a video see-through setup to work.
In the general case when a video see-through system is
used, there is no need to know the location of the tracker
components in the world because the sensor is collocated
with the display and the world is collocated with the pattern
tracked. Fuhrmann [6] described a method for fast calibration
in AR, but the description of the method is very succinct
and therefore difficult to reproduce, and it is not clear
that the transforms we are concerned with are actually determined.
Tuceryan introduced the Single Point Active Alignment
Method (SPAAM) [7, 14] to perform the calibration
of an optical see-through HMD. However, this method requires
many points (minimum 6, recommended 12) to be
sighted by one user, and stereo must be used to judge and
correctly align the depth, which is a difficult task.

The structure of this paper is as follows. Section 2 defines
the notation and discusses the problem statement in
detail. In Section 3 we consider the problem of calculating
the sensor-to-manipulator transformation when the
world-to-base transformation is assumed to be known. Section 4
extends this to the case when neither the sensor-to-manipulator
nor the world-to-base transformations are
known. The implementation of this framework is outlined
in Section 5 and a set of results for a test case is given in
Section 6. A summary and conclusions are given in Section 7.

2.  Problem  Statement 

Throughout this paper we use the following notation. Let
a referential (or rigid coordinate system) X be written as
X. Let XY be the motion or homogeneous transformation
that aligns the X referential with a second referential Y¹.
Furthermore, from the properties of inverses of transformations,
YX = (XY)^{-1}.

The  problem  of  aligning  the  head  mounted  display  with 
the  graphics  system  is  illustrated  in  Figure  2.  A  user  wears 
a  tracked  head  mounted  display.  The  part  of  the  head 
mounted  display  responsible  for  generating  the  graphics  is 
fixed  to  the  manipulator  with  referential  M.  A  sensor  with 
referential  S  is  rigidly  attached  to  the  headband  of  the  head 
mounted display. The sensor base (origin of the tracking
system) is B. Therefore, the tracking system actually measures
BS. To render the graphics properly, the attitude of
the graphics display in the world (WM) must be known.

From  the  figure,  these  transformations  are  given  by 

WM  =  WB  .  BS  .  SM.  (1) 

In other words, WM can be calculated if the world-to-base
(WB) and sensor-to-manipulator (SM) transformations
are known. In some situations these quantities can be
determined in advance. However, in some circumstances it
can be difficult to determine these quantities in an offline
manner.

¹ In other words, if the transformation matrix of X is M_X and the
transformation matrix of Y is M_Y, then XY = M_X^{-1} . M_Y.

Figure 2. The referentials and transformations
which are relevant for tracker alignment.

The sensor-to-manipulator transformation SM can be
difficult to calculate for two reasons. The first is that it can
be difficult to determine the point at which the measurement
is made. Sensing devices are of finite size, and it can
be difficult to work out the point within the sensor which is
being tracked. The second is that the sensor and manipulator
might not be rigidly attached to one another. The Sony
Glasstron, for example, includes a hinged joint which allows
the display to be translated and rotated with respect to
the headband. To counterbalance the weight distribution on
the user’s head, it is not always possible to attach the tracker
to the display itself but rather to the headband. Therefore,
as a user adjusts a display (either during a calibration procedure
or even during normal use), SM is changed. Even
if the display is fixed with respect to the sensor’s mounting
point, the sensor might be installed in a manner such that its
transformation with the display is not intuitive².

At first sight, it might appear that calculating the world-to-base
transformation WB is significantly simpler. The
prevailing assumption appears to be that this is a fixed, easy
to identify and easy to measure property. As a result, with
careful measurement, the transformation can be calculated.
However, there are two difficulties with this approach. The
first is that it is not always possible to accurately measure
the base of the tracker in the physical world. In our mobile
augmented reality work, indoor users (who are tracked
by an InterSense IS900) must be able to see and interact
with outdoor mobile users (whose positions are tracked
using a GPS). Therefore, the base of the IS900 must be
expressed in world fixed (longitude/latitude) coordinates.
However, it is not immediately obvious how this can be calculated³.
A second problem is that some tracking systems
simply do not have a tangible, physical source which corresponds
to the tracker base. The InterSense InertiaCube
2 (IC2), for example, utilizes magnetometers to constrain
the yaw of an orientation tracker. The base of the magnetometer
is magnetic north. However, as is well known, local
magnetic anomalies can distort the magnetic field. In effect,
the sensor base changes as the tracker moves through the
environment.

² One common practice is to try to mount the trackers “as horizontally
as possible on the display”. Assuming the tracker is horizontal, only the
yaw needs to be corrected. However, in general the sensor will not be
properly aligned and this leads to coupling of pitch and yaw rotations in
highly non-intuitive ways.

We now describe two calibration procedures. The first
calculates SM under the assumption that WB is known.
The second generalizes this result to determine both WB
and SM.

3.  Single  Point  Calibration  Technique 

The single point calibration technique uses a single measurement
to calculate SM. The technique is built on the
observation that SM can be calculated by inverting Equation 1:

SM = SB . BW . WM. (2)

The difficulty with this approach is that the true location
of the manipulator, expressed in world coordinates, must
be known. Because we are using an optical see-through
display, this can be achieved by asking the users themselves
to align the contents of the display directly with objects in
the environment.

The calibration procedure is illustrated in Figure 3: the
environment contains two calibration points, a reference
point on the ground (C) and a calibration mark on the wall
(G). The locations of these marks must be known within the
coordinate system of the model. The display renders graphics
as if the head mounted display was placed at a known
value of WM. The user is asked to stand on C and align the
contents of the display with G. When the two are aligned,
the user has positioned the display at the known WM location
and a tracker sample (a sample of SB) is recorded.
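
The algebra of Equation 2 can be checked numerically. The following is a minimal sketch, assuming 4x4 homogeneous matrices stored as NumPy arrays and the column-vector convention; the helper names (`make_transform`, `rot_z`, `single_point_sm`) and all numeric poses are hypothetical and are not taken from the system described in this paper.

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def rot_z(angle):
    """Rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def single_point_sm(WB, BS, WM):
    """Equation 2: SM = SB . BW . WM, with SB = inv(BS) and BW = inv(WB)."""
    return np.linalg.inv(BS) @ np.linalg.inv(WB) @ WM

# Made-up ground truth, used only to check the algebra.
WB = make_transform(rot_z(0.3), [1.0, 2.0, 0.5])
SM_true = make_transform(rot_z(-0.1), [0.02, 0.10, -0.05])
WM = make_transform(rot_z(1.2), [4.0, 3.0, 1.7])  # pose the user is asked to match

# Tracker sample implied by Equation 1 (WM = WB . BS . SM) once the user is aligned.
BS = np.linalg.inv(WB) @ WM @ np.linalg.inv(SM_true)

SM_est = single_point_sm(WB, BS, WM)
print(np.allclose(SM_est, SM_true))
```

With noise-free inputs the estimate reproduces the assumed SM exactly; with a real tracker sample, any error in WB, BS or the user's alignment propagates directly into the result.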

The transformation WM can be decomposed into a
transformation from the world to C, and from C to M:

WM = WC . CM. (3)

Under the assumption that C is not rotated⁴, WC is of the
form:

$$WC = \begin{bmatrix} 1 & 0 & 0 & x_{WC} \\ 0 & 1 & 0 & y_{WC} \\ 0 & 0 & 1 & z_{WC} \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{4}$$

³ Because the tracking system is mounted indoors, GPS cannot be used
to directly measure position.

⁴ This assumption is valid because C specifies the location of the user’s
feet. The orientation of the user’s head is determined by CM.

Figure 3. The single point calibration technique.
Assuming that the world-to-base
transformation is known, the sensor-to-manipulator
transformation can be calculated
by asking the user to align the contents of the
display with appropriate objects in the environment.

To calculate CM we use three assumptions. The first
assumption is that the user looks directly at G. Therefore,
when the display is correctly aligned with the real world,
G is projected into the center of the screen⁵. The second
assumption is that the tracker is aligned such that the
roll component is zero. The third assumption is that the height
of the tracker off the ground, H, is known.

Decomposing CM into a pure translation and a pure rotation,

$$CM = T_{CM} \cdot \Theta_{CM}. \tag{5}$$

The first component is the vertical translation,

$$T_{CM} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & H \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{6}$$


The second component is the rotation needed to align the
displays. Using the assumption that the roll angle is zero,
this can be decomposed into an azimuth rotation ψ (about
the body-fixed z-axis) followed by an elevation rotation φ
(about the body-fixed x-axis):

$$\Theta_{CM} = R_z \cdot R_x. \tag{7}$$

ψ and φ can be calculated from MG, the transformation
from the manipulator to the calibration mark. The
translation (x_MG, y_MG, z_MG) is not a function of the
orientation. Therefore,

$$\psi = \tan^{-1}\left(\frac{x_{MG}}{y_{MG}}\right), \qquad \phi = \sin^{-1}\left(\frac{z_{MG}}{\sqrt{x_{MG}^2 + y_{MG}^2 + z_{MG}^2}}\right).$$

⁵ This is only true if the display is monoscopic. If it is stereoscopic, the
projection is shifted for each eye according to the eye separation.

The resulting procedure is simple and effective. The user
is merely asked to stand in a known location and align the
contents of the head mounted display with the environment.
We have used it extensively in our own system and have applied
it to many demonstrations using the system at many
different sites. This calibration is easy and allows the flexibility
to place the sensor anywhere on the user’s head and
with any orientation.

However, this algorithm relies on the assumption that
CM can be calculated. In the approach presented here,
this is equivalent to assuming that the height of the display
off the ground, H, is known. There are two ways of addressing
this problem. The first is to attempt to measure
the user’s height accurately. The second is to place the calibration
target as far from the user as is practical. However,
if more accuracy is required, the SPAAM method could be
used [14]. Another possibility is to use a video see-through
display looking at a known landmark. In this specific case,
the camera is collocated with the graphics referential (the
manipulator) and therefore the pose of the camera recovered
by vision tracking during the calibration phase directly
gives the pose of the manipulator. In the case where an optical
see-through display must be used, a similar approach
could be taken by rigidly attaching a camera to the display
and calibrating the camera to display transformation once
and for all. We are currently working on such a calibration
method⁶.

⁶ The calibration of the properties of an optical see-through display
(such as the field of view and the distortion) is done using a calibration
grid on which graphical patterns seen through the display are aligned by a
user. At the same time, a camera attached to the display can use the same
calibration grid to locate itself with respect to the grid. Since one of the
results of the optical calibration of the display is the location of the
focal point of the display, the transform from the camera focal point to the
manipulator (or graphical referential) can be determined. Once it is known, if we use some known
pattern instead of the calibration cross currently used, then the camera
can locate itself precisely with respect to the pattern at calibration time, and
consequently the graphics referential can be located. In effect, the user will
not be performing the calibration but roughly aligning the field of view of
the camera so that it can see the pattern. The camera will then locate itself
with respect to the pattern and provide a precise attitude for the manipulator,
much more precise than if it were obtained by the user.


Figure 4. The Multi Point Calibration technique.


Although  this  approach  is  effective  for  cases  where  the 
base  is  known,  it  cannot  be  used  when  the  pose  of  the  base 
is  unknown.  The  next  section  introduces  a  method  to  solve 
this  problem. 

4.  Multiple  Point  Calibration  Technique 

When the world-to-base transformation WB is not
known, the single point calibration technique described in
the last section cannot be used because Equation 2 cannot be evaluated.
However, the necessary information can be gleaned
from looking at the relative transformations which occur
when the user calibrates on a pair of calibration points.

Consider the situation shown in Figure 4: U_0 and U_1 represent
the user’s head, which includes the sensor S and manipulator
M (which in this specific case is the graphical referential).
V_0 represents the rigid group that includes the world W
and the base of the tracking system B. When the user moves
her head between two calibration marks (motion from U_0 to
U_1), the sensor produces the motion S_0S_1 and the manipulator
produces the motion M_0M_1. S_0S_1 and M_0M_1 are related
through the transformation SM between the sensor and the
manipulator. The problem can be inverted to find BW, the
transformation between the base and the world, using no
additional motion from the user. In effect, a motion of the
head with respect to the world can be seen as a motion of
the world with respect to the head. In this case, the head is
fixed in U_0 and the world and tracker base are moving from
V_0 to V_1, so that B_0B_1 and W_0W_1 are related through
the transformation BW.

More formally, suppose the user stands at two different
locations (C_1 and C_2) and looks at two different calibration
marks (G_1 and G_2). The sensor referentials in these two
locations are S_1 and S_2, and the manipulator referentials
are M_1 and M_2.

Writing out Equation 2 for each measurement,

SM = S_1B . BW . WM_1 (8)

SM = S_2B . BW . WM_2. (9)

Rearranging Equation 8,

BW = BS_1 . SM . M_1W.

Substituting into Equation 9,

SM = S_2B . BS_1 . SM . M_1W . WM_2.

Premultiplying both sides by S_1B . BS_2, the inverse of S_2B . BS_1, gives

(S_1B . BS_2) . SM = SM . (M_1W . WM_2). (10)

This is exactly the same as the so-called “hand-eye” calibration
problem which is frequently encountered
in robotics [12]. In a typical robotics application, a
manipulator is rigidly attached to the actuator. The transformation
from the actuator to the manipulator is not known.
However, both the actuator and the manipulator contain
tracking systems. In a typical configuration, the actuator
might be a robotic arm (whose geometry is known and
whose joint angles are measured) and the manipulator contains
a camera. The problem is conventionally posed as

A.X = X.B (11)

where A is the motion of the first referential, B is the motion
of the second referential, and X is the transformation
that aligns the first referential with the second one.
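
As a sanity check on Equation 10, the relative motions can be formed from synthetic poses and substituted into the form A.X = X.B. The sketch below assumes NumPy, 4x4 homogeneous matrices and the column-vector convention; the helper `rigid` and every numeric pose are hypothetical values chosen for illustration only.

```python
import numpy as np

def rigid(axis, angle, translation):
    """4x4 rigid transform from an axis-angle rotation (Rodrigues) and a translation."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = translation
    return T

inv = np.linalg.inv

# Made-up ground truth for the two unknown transformations.
SM = rigid([0, 0, 1], 0.2, [0.0, 0.1, 0.05])
WB = rigid([1, 0, 0], 0.4, [2.0, 1.0, 0.0])

# Known display poses at the two calibration points.
WM1 = rigid([0, 1, 0], 0.7, [1.0, 0.0, 1.7])
WM2 = rigid([1, 1, 0], -0.5, [3.0, 2.0, 1.6])

# Tracker samples implied by Equation 1: BS_k = BW . WM_k . MS.
BS1 = inv(WB) @ WM1 @ inv(SM)
BS2 = inv(WB) @ WM2 @ inv(SM)

A = inv(BS1) @ BS2  # S_1B . BS_2, relative sensor motion
B = inv(WM1) @ WM2  # M_1W . WM_2, relative manipulator motion

lhs, rhs = A @ SM, SM @ B
print(np.allclose(lhs, rhs))
```

Note that A is built from tracker samples only and B from the known display poses only, which is why WB does not need to be known to pose the problem for SM.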

If A and B could be measured perfectly, solving this
equation would be a trivial linear algebra problem. However,
because A and B are measured by noise-corrupted
sensors, more sophisticated techniques must be used to ensure
that X is a properly formed homogeneous transformation
matrix. Within the robotics literature, a number
of different approaches have been proposed. For this paper
we used a closed-form solution developed by Park and
Martin [5]. This solution, described in detail in the appendix,
uses Lie Bracketing Algebra and matrix logarithms
and yields an extremely compact and easy to implement solution.

A similar approach can be taken to solve for BW. Equating
the right-hand sides of Equations 8 and 9,

S_1B . BW . WM_1 = S_2B . BW . WM_2.

Premultiplying by BS_2 and postmultiplying by M_1W,

(BS_2 . S_1B) . BW = BW . (WM_2 . M_1W). (12)

Once again, this is in the form of the “hand-eye” calibration
problem and can be solved in exactly the same manner⁷.

⁷ Another way to consider the problem is that, to solve for SM, we

5.  Implementation 

The calibration framework described in the previous section
was implemented within the Battlefield Augmented Reality
System (BARS) [9]. The interactive authoring system
described in [3] was extended to allow users to annotate an
environment model with a set of N calibration cross referentials
C_i and calibration marks G_i. SM_i is calculated from
Equation 3 for each C_i/G_i pair.

When solving Equations 10 and 12, it is possible to exploit
the fact that Equations 8 and 9 apply for any pair of
relative transformations. For example, when solving Equation 10,
it is possible to construct N(N - 1)/2 equations of
the form

(S_iB . BS_j) . SM = SM . (M_iW . WM_j)

where i, j ∈ [1, ..., N] and i ≠ j. As shown in the appendix,
this can greatly increase the performance of the solution.
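
The pairing of measurements can be sketched as follows. This is an illustrative fragment using only the Python standard library; the function name `pair_indices` is hypothetical, and the value 7 is simply the number of calibration crosses used in the example of Section 6.

```python
from itertools import combinations

def pair_indices(n):
    """All unordered index pairs (i, j) with i < j drawn from 0..n-1.

    Each pair corresponds to one equation of the form
    (S_iB . BS_j) . SM = SM . (M_iW . WM_j).
    """
    return list(combinations(range(n), 2))

pairs = pair_indices(7)
print(len(pairs))  # 7 * 6 / 2 = 21
```

Each additional calibration point therefore adds N new equations rather than one, which is how a small number of user alignments can yield a comparatively large set of constraints.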

We now demonstrate the use of this approach in a test
environment.

6.  Example 

The alignment framework was used to align the sensors
in an indoor mobile augmented reality system. The geometric
model of the environment is shown in Figure 5:
the environment consists of a single room and a number of
pieces of laboratory equipment. The model has been augmented
to include the calibration crosses (the nth cross is
labeled AXXBn) and the calibration marks. Most of the calibration
marks in this model are preexisting features such
as the corners of doors or walls. One artificial calibration
mark, a cross, can be seen on the right of the picture.

A set of 7 calibration crosses and calibration marks was
created. To test the effectiveness of this configuration, a
sensitivity analysis was performed using a Monte Carlo
analysis. It was assumed that the measurement error (which
includes tracker error and misalignment errors by the user)
has a standard deviation of 0.05 m in position and 0.5 degrees
in orientation⁸. Figures 6(a) and 6(b) show the 2σ
bounds of the error in SM. These plots show
that the 2σ error in SM is between 0.05 m (X and Z) and
0.15 m (Y), and the orientation error is between 0.5° (X and
Y) and 1.2° (Z). We believe that the errors in Y (for position)
and Z (for orientation) are much larger as a result of
the calibration configuration which was used, namely that
all of the calibration marks were at approximately the same
height. The simulation studies confirm that, with a more
uniform distribution of marks in three dimensions, the errors
in all rotation angles and positions decrease at approximately
the same rate.

assume that the tracker base B is fixed and allow the manipulator M to
move. To solve for WB, we assume that M is fixed and allow B to move.

⁸ Positions are given in metres. Orientations are expressed in Euler angles
using rotation about fixed XYZ axes (the same convention is used by
Java3D).

Figure 5. The sample calibration environment.

The figures also illustrate that, as the number of combinations
increases, the magnitude of the error is reduced.
Figures 6(c) and 6(d) show the standard deviations of the
errors in translation and orientation of WB. The results
are very similar to those for SM; the error in position
ranged between 0.35 and 0.7 m, and the error in
orientation ranged between 0.4° and 1.2°.

These measurements were confirmed by conducting an
actual calibration experiment. The tracking system is an InterSense
IS900LAT. The user stood at each of the 7 calibration
points and was asked to align the display with the appropriate
calibration mark. Through careful (and laborious)
measurement, the value of WB was accurately obtained.
The calculated and true values are compared in Table 1.
As can be seen, the results are extremely accurate
for almost all results and are, in fact, significantly better
than those predicted by the covariance analysis. In this experimental
configuration, SM could not be accurately measured
independently. However, because observed registration
errors in the calibrated display were small, we believe
that it was estimated accurately.

7.  Conclusions 

In this paper we have presented a method for aligning
trackers in augmented reality systems. The approach described
here is novel in two respects. First, the alignment
process is extremely easy and intuitive: a user is asked to
stand at a known location and align the display with known
objects in the environment. Second, the method is capable

Figure 6. The error in the calculated values of SM and BW as a function of the number of combinations
used. Each figure plots the 2σ (95% probability) error bounds. Panels: (a) Translation error in SM;
(b) Orientation error in SM; (c) Translation error in BW; (d) Orientation error in BW. Horizontal axis:
number of combinations; vertical axis: error standard deviation.
BW
0.533

0.53
-180

-179.913 

6 

0 

0.774 

90 

90.466 

Table 1. The calculated and true values of the
base-to-world transformation.


of  calculating  the  sensor-to-manipulator  and  world-to-base 
transformations  in  the  same  step. 

We  shall  extend  this  calibration  method  in  the  following 
ways: 

• Develop tools that will provide the user with feedback in
designing a calibration scheme. For example, the algorithm
we used to solve the hand-eye calibration problem
cannot be applied if the trace of any transformation
matrix is -1. Such conditions can be detected as
the marks and crosses are being created and surveyed
in.

• Explore schemes to automatically optimize the placement
of calibration marks to improve the accuracy of
the calibration result, based on using this tool and understanding
in which situations the logarithm fails.

• Explore the effect of other solvers to see how these are
affected by noise and/or marker placement.

• Use this algorithm to solve the relative placement of
two trackers used to form a hybrid tracker.

A.  Solving  the  Calibration  Equations 

The calibration framework relies on the ability to solve
the equation

A.X = X.B (13)

where A, B and X are transformation matrices. This problem
is extremely important in the field of robotics where it
is known as the “hand-eye” calibration problem. Given a
set of N measurements of A and B, find X such that

$$A_1 X = X B_1, \quad \ldots, \quad A_N X = X B_N.$$

A number of different solutions have been developed for
this problem. Most of these solutions are iterative, and are
typically designed for automatic systems where many hundreds
of samples can be taken. Accurate solvers which require
few measurements are extremely important.

For this paper, we used an approach which was developed
by Park and Martin in [5]. Despite the theoretical
complexity of the algorithm (it is based on the matrix logarithm
of the transformation matrix) it is extremely easy to
implement.

Let Θ ∈ SO(3) be any rotation matrix and let b ∈ R^3 be
the translation. Therefore, any valid transformation matrix
M has the form

$$M = \begin{bmatrix} \Theta & b \\ 0 & 1 \end{bmatrix}.$$

If trace[Θ] ≠ -1, the logarithm of this matrix is

$$\log M = \begin{bmatrix} [\omega] & A^{-1} b \\ 0 & 0 \end{bmatrix}$$

where [ω] = log Θ and A is a matrix whose value is irrelevant for
solving the calibration problem.

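Let φ be

$$\phi = \cos^{-1}\left(\frac{\operatorname{trace}[\Theta] - 1}{2}\right). \tag{14}$$

The matrix logarithm [ω] is

$$[\omega] = \frac{\phi}{2 \sin \phi}\left(\Theta - \Theta^{T}\right). \tag{15}$$

This is a skew symmetric matrix

$$[\omega] = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}. \tag{16}$$

Therefore, [ω] can be parameterised as the vector ω
where

$$\omega = \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{bmatrix}.$$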

Let α_i be the matrix logarithm of measurement A_i and
β_i be the matrix logarithm of measurement B_i.

The Park-Martin algorithm [5] attempts to find X

$$X = \begin{bmatrix} \Theta_X & b_X \\ 0 & 1 \end{bmatrix}.$$

The algorithm decomposes the solution into two subproblems.
The first is to calculate the rotation Θ_X. This
can be carried out independently of the translations. The
second problem calculates b_X using the calculated value of
Θ_X.

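The rotation matrix Θ_X is chosen to minimise the cost
function

$$\eta_1 = \sum_{i=1}^{p} \left\| \Theta_X \beta_i - \alpha_i \right\|^2. \tag{17}$$

The optimal solution is

$$\Theta_X = \left(M^{T} M\right)^{-1/2} M^{T} \tag{18}$$

where

$$M = \sum_{i=1}^{p} \beta_i \alpha_i^{T}. \tag{19}$$

If p = 2, the third measurements are synthesised as
α_3 = α_1 × α_2 and β_3 = β_1 × β_2.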

The resulting matrix Θ_X has the property that it is always
guaranteed to be orthonormal even if the data is noisy.

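The second optimisation solution minimises

$$\eta_2 = \sum_{i=1}^{p} \left\| (\Theta_{A_i} - I) \cdot b_X - \Theta_X \cdot b_{B_i} + b_{A_i} \right\|^2. \tag{20}$$

This can be expressed as a standard least squares minimisation
problem and its solution is

$$b_X = \left(C^{T} C\right)^{-1} C^{T} d$$

where

$$C = \begin{bmatrix} I - \Theta_{A_1} \\ \vdots \\ I - \Theta_{A_p} \end{bmatrix}$$

and

$$d = \begin{bmatrix} b_{A_1} - \Theta_X \cdot b_{B_1} \\ \vdots \\ b_{A_p} - \Theta_X \cdot b_{B_p} \end{bmatrix}.$$

This equation can be solved even if only 2 measurements
are used.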
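
The two sub-problems above can be sketched in a few lines of NumPy. This is an illustrative reading of Equations 14 to 20, not the implementation used in BARS; the function names (`rigid`, `log_rotation`, `solve_ax_xb`) and the synthetic test poses are assumptions, and the sketch does not guard against the degenerate case trace[Θ] = -1.

```python
import numpy as np

def rigid(axis, angle, translation):
    """4x4 rigid transform from an axis-angle rotation (Rodrigues) and a translation."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    T = np.eye(4)
    T[:3, :3] = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    T[:3, 3] = translation
    return T

def log_rotation(R):
    """Equations 14-16: rotation matrix -> vector omega (assumes trace(R) != -1)."""
    phi = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(phi, 0.0):
        return np.zeros(3)
    W = phi / (2.0 * np.sin(phi)) * (R - R.T)
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def solve_ax_xb(As, Bs):
    """Estimate X in A_i . X = X . B_i from lists of 4x4 matrices."""
    alphas = [log_rotation(A[:3, :3]) for A in As]
    betas = [log_rotation(B[:3, :3]) for B in Bs]
    if len(As) == 2:
        # Synthesise a third measurement from the cross products.
        alphas.append(np.cross(alphas[0], alphas[1]))
        betas.append(np.cross(betas[0], betas[1]))
    # Equation 19, then Equation 18 via an eigendecomposition of M^T M.
    M = sum(np.outer(b, a) for a, b in zip(alphas, betas))
    w, V = np.linalg.eigh(M.T @ M)
    theta_x = V @ np.diag(w ** -0.5) @ V.T @ M.T
    # Least squares for the translation: C . b_X = d.
    C = np.vstack([np.eye(3) - A[:3, :3] for A in As])
    d = np.concatenate([A[:3, 3] - theta_x @ B[:3, 3] for A, B in zip(As, Bs)])
    b_x, *_ = np.linalg.lstsq(C, d, rcond=None)
    X = np.eye(4)
    X[:3, :3] = theta_x
    X[:3, 3] = b_x
    return X

# Synthetic, noise-free check with made-up poses: A_i = X . B_i . X^-1.
X_true = rigid([0.2, -0.5, 1.0], 0.6, [0.1, -0.2, 0.3])
Bs = [rigid([1, 0, 0], 0.8, [0.5, 0.1, 0.0]), rigid([0, 1, 1], -0.6, [0.0, 0.4, 0.2])]
As = [X_true @ B @ np.linalg.inv(X_true) for B in Bs]
X_est = solve_ax_xb(As, Bs)
print(np.allclose(X_est, X_true, atol=1e-6))
```

The two motions in the check have non-parallel rotation axes; with parallel axes M^T M becomes singular and the inverse square root is undefined, which is one of the placement conditions a calibration design tool would need to detect.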